Discovering Latent Information By Spreading Activation Algorithm For Document Retrieval
Syntactic search relies on keywords contained in a query to find suitable
documents. So, documents that do not contain the keywords but contain
information related to the query are not retrieved. Spreading activation is an
algorithm for finding latent information in a query by exploiting relations
between nodes in an associative network or semantic network. However, the
classical spreading activation algorithm uses all relations of a node in the
network, which adds unsuitable information to the query. In this paper, we
propose a novel approach for semantic text search, called
query-oriented-constrained spreading activation that only uses relations
relating to the content of the query to find really related information.
Experiments on a benchmark dataset show that, in terms of the MAP measure, our
search engine is 18.9% and 43.8% respectively better than the syntactic search
and the search using the classical constrained spreading activation.
KEYWORDS: Information Retrieval, Ontology, Semantic Search, Spreading
Activation
Comment: 12 pages, will be published in The International Journal of Artificial
Intelligence & Applications (IJAIA). arXiv admin note: text overlap with
arXiv:1807.0796
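The difference between classical and query-constrained spreading activation can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy graph, the relation names, the `decay` factor, and the `threshold` cut-off are all assumptions chosen for illustration.

```python
from collections import deque

def spread(graph, seeds, allowed_relations=None, decay=0.5, threshold=0.1):
    """Breadth-first spreading activation over a labelled graph.

    graph maps a node to a list of (relation, neighbour) pairs.
    If allowed_relations is given, only those relations are followed
    (a stand-in for "relations relating to the content of the query").
    """
    activation = {node: 1.0 for node in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for relation, neighbour in graph.get(node, []):
            if allowed_relations is not None and relation not in allowed_relations:
                continue  # constrained variant: skip relations not tied to the query
            value = activation[node] * decay
            # stop when the signal is too weak or would not raise the neighbour's activation
            if value < threshold or value <= activation.get(neighbour, 0.0):
                continue
            activation[neighbour] = value
            queue.append(neighbour)
    return activation

# Hypothetical toy network, not taken from the paper.
graph = {
    "Hanoi": [("capital_of", "Vietnam"), ("located_on", "Red River")],
    "Vietnam": [("member_of", "ASEAN")],
}

classical = spread(graph, ["Hanoi"])
constrained = spread(graph, ["Hanoi"], allowed_relations={"capital_of"})
```

In this toy case the classical run also activates "Red River" and "ASEAN", while the constrained run only adds "Vietnam" to the query.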
A Similarity Measure for Weaving Patterns in Textiles
We propose a novel approach for measuring the similarity between weaving
patterns that can provide similarity-based search functionality for textile
archives. We represent textile structures using hypergraphs and extract
multisets of k-neighborhoods from these graphs. The resulting multisets are
then compared using Jaccard coefficients, Hamming distances, and cosine
measures. We evaluate the different variants of our similarity measure
experimentally, showing that it can be implemented efficiently and illustrating
its quality by using it to cluster and query a data set containing more than a
thousand textile samples.
Comment: 10 pages, will be published in SIGIR 201
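As a rough illustration of comparing multisets, the sketch below computes a multiset Jaccard coefficient and a cosine measure with Python's `Counter`. The neighborhood labels are hypothetical placeholders; extracting k-neighborhoods from a hypergraph is not shown, and the paper's exact definitions may differ.

```python
import math
from collections import Counter

def multiset_jaccard(a, b):
    """Jaccard coefficient on multisets: sum of min counts over sum of max counts."""
    union = sum((a | b).values())
    if union == 0:
        return 1.0  # convention: two empty multisets are treated as identical
    return sum((a & b).values()) / union

def multiset_cosine(a, b):
    """Cosine of the angle between the two count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical neighborhood labels for two textile samples.
sample_a = Counter(["plain", "plain", "twill"])
sample_b = Counter(["plain", "twill", "twill", "satin"])

jaccard = multiset_jaccard(sample_a, sample_b)
cosine = multiset_cosine(sample_a, sample_b)
```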
Semantic Search by Latent Ontological Features
Both named entities and keywords are important in defining the content of a
text in which they occur. In particular, people often use named entities in
information search. However, named entities have ontological features, namely,
their aliases, classes, and identifiers, which are hidden from their textual
appearance. We propose ontology-based extensions of the traditional Vector
Space Model that explore different combinations of those latent ontological
features with keywords for text retrieval. Our experiments on benchmark
datasets show better search quality of the proposed models as compared to the
purely keyword-based model, and their advantages for both text retrieval and
representation of documents and queries.
Comment: 17 pages, accepted by New Generation Computing (2012)
Combining Named Entities with WordNet and Using Query-Oriented Spreading Activation for Semantic Text Search
Purely keyword-based text search is not satisfactory because named entities
and WordNet words are also important elements to define the content of a
document or a query in which they occur. Named entities have ontological
features, namely, their aliases, classes, and identifiers. Words in WordNet
also have ontological features, namely, their synonyms, hypernyms, hyponyms,
and senses. Those features of concepts may be hidden from their textual
appearance. Besides, there are related concepts that do not appear in a query,
but can bring out the meaning of the query if they are added. We propose an
ontology-based generalized Vector Space Model for semantic text search. It
exploits ontological features of named entities and WordNet words, and develops
a query-oriented spreading activation algorithm to expand queries. In addition,
it combines and utilizes advantages of different ontologies for semantic
annotation and searching. Experiments on a benchmark dataset show that, in
terms of the MAP measure, our model is 42.5% better than the purely
keyword-based model, and 32.3% and 15.9% respectively better than the ones
using only WordNet or named entities.
Keywords: semantic search, spreading activation, ontology, named entity,
WordNet.
Comment: 6 pages, accepted by RIVF. arXiv admin note: substantial text overlap
with arXiv:1807.05579; text overlap with arXiv:1807.0557
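Several entries in this listing report results in terms of the MAP (mean average precision) measure. A minimal sketch of the standard definition follows; the document identifiers and relevance judgments are made up for illustration and are not from any of the benchmark datasets.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs is a list of (ranked result list, set of relevant documents), one per query."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Hypothetical results for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
map_score = mean_average_precision(runs)
```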
WordNet-Based Information Retrieval Using Common Hypernyms and Combined Features
Text search based on lexical matching of keywords is not satisfactory due to
polysemous and synonymous words. Semantic search that exploits word meanings,
in general, improves search performance. In this paper, we survey WordNet-based
information retrieval systems, which employ a word sense disambiguation method
to process queries and documents. The problem is that in many cases a word has
more than one possible direct sense, and picking only one of them may give a
wrong sense for the word. Moreover, the previous systems use only word forms to
represent word senses and their hypernyms. We propose a novel approach that
uses the most specific common hypernym of the remaining undisambiguated
multi-senses of a word, as well as combined WordNet features to represent word
meanings. Experiments on a benchmark dataset show that, in terms of the MAP
measure, our search engine is 17.7% better than the lexical search, and at
least 9.4% better than all surveyed search systems using WordNet.
Keywords: Ontology, word sense disambiguation, semantic annotation, semantic
search.
Comment: 6 pages, will be in proceedings of the 5th International Conference on
Intelligent Computing and Information Systems (ICICIS-2011), in cooperation
with ACM. 30 June to 3 July, 2011, Cairo, Egypt
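The idea of falling back to the most specific common hypernym of the senses that remain after disambiguation can be sketched on a toy taxonomy. The sense names and the `parent` map below are hypothetical and assume a single-parent tree; real WordNet synsets can have several hypernyms, which this sketch does not handle.

```python
def path_to_root(sense, parent):
    """Return the chain sense -> hypernym -> ... -> root, most specific first."""
    path = [sense]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def most_specific_common_hypernym(senses, parent):
    """Deepest node shared by the hypernym chains of all given senses."""
    paths = [path_to_root(sense, parent) for sense in senses]
    common = set(paths[0]).intersection(*paths[1:])
    for node in paths[0]:  # the first shared node on the chain is the most specific one
        if node in common:
            return node
    return None

# Hypothetical taxonomy for three senses of the word "bass".
parent = {
    "sea_bass": "fish",
    "freshwater_bass": "fish",
    "fish": "animal",
    "animal": "entity",
    "bass_guitar": "instrument",
    "instrument": "artifact",
    "artifact": "entity",
}

remaining = ["sea_bass", "freshwater_bass"]  # senses left undisambiguated
hypernym = most_specific_common_hypernym(remaining, parent)
```

Here the two fish senses generalize to "fish", whereas including the instrument sense would push the result up to the uninformative root.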
Designing and Implementing Data Warehouse for Agricultural Big Data
In recent years, precision agriculture that uses modern information and
communication technologies is becoming very popular. Raw and semi-processed
agricultural data are usually collected from various sources, such as the
Internet of Things (IoT), sensors, satellites, weather stations, robots, farm
equipment, farmers and agribusinesses. Besides, agricultural datasets are
very large, complex, unstructured, heterogeneous, non-standardized, and
inconsistent. Hence, agricultural data mining is considered a Big Data
application in terms of volume, variety, velocity and veracity. It is a key
foundation for establishing a crop intelligence platform, which will enable
resource efficient agronomy decision making and recommendations. In this paper,
we designed and implemented a continental level agricultural data warehouse by
combining Hive, MongoDB and Cassandra. Our data warehouse offers the following
capabilities: (1) flexible schema; (2) data integration from multiple real
agricultural datasets; (3) data science and business intelligence support; (4)
high performance; (5) high storage; (6) security; (7) governance and
monitoring; (8) replication and recovery; (9) consistency, availability and
partition tolerance; (10) distributed and cloud deployment. We also evaluate
the performance of our data warehouse.
Comment: Business intelligence, data warehouse, constellation schema, Big Data,
precision agriculture
Exploring Combinations of Ontological Features and Keywords for Text Retrieval
Named entities have been considered and combined with keywords to enhance
information retrieval performance. However, there is not yet a formal and
complete model that takes into account entity names, classes, and identifiers
together. Our work explores various adaptations of the traditional Vector Space
Model that combine different ontological features with keywords, and in
different ways. It shows better performance of the proposed models as compared
to the keyword-based Lucene, and their advantages for both text retrieval and
representation of documents and queries.
Comment: 10 pages, will be in PRICAI. arXiv admin note: substantial text
overlap with arXiv:1807.0557
Discovering Latent Concepts and Exploiting Ontological Features for Semantic Text Search
Named entities and WordNet words are important in defining the content of a
text in which they occur. Named entities have ontological features, namely,
their aliases, classes, and identifiers. WordNet words also have ontological
features, namely, their synonyms, hypernyms, hyponyms, and senses. Those
features of concepts may be hidden from their textual appearance. Besides,
there are related concepts that do not appear in a query, but can bring out the
meaning of the query if they are added. Traditional constrained spreading
activation algorithms use all relations of a node in the network, which adds
unsuitable information to the query. In contrast, we only use relations
represented in the query. We propose an ontology-based generalized Vector Space
Model for semantic text search. It discovers relevant latent concepts in a query
by relation constrained spreading activation. Besides, to represent a word
having more than one possible direct sense, it combines the most specific
common hypernym of the remaining undisambiguated multi-senses with the form of
the word. Experiments on a benchmark dataset in terms of the MAP measure for
the retrieval performance show that our model is 41.9% and 29.3% better than
the purely keyword-based model and the traditional constrained spreading
activation model, respectively.
Comment: 9 pages, accepted by the 5th International Joint Conference on
Natural Language Processing (IJCNLP-2011). arXiv admin note: text overlap
with arXiv:1807.0557
An Efficient Data Warehouse for Crop Yield Prediction
Nowadays, precision agriculture, combined with modern information and
communications technologies, is becoming more common in agricultural activities
such as automated irrigation systems, precision planting, variable rate
applications of nutrients and pesticides, and agricultural decision support
systems. In the latter, crop management data analysis, based on machine
learning and data mining, focuses mainly on how to efficiently forecast and
improve crop yield. In recent years, raw and semi-processed agricultural data
are usually collected using sensors, robots, satellites, weather stations, farm
equipment, farmers and agribusinesses while the Internet of Things (IoT) should
deliver the promise of wirelessly connecting objects and devices in the
agricultural ecosystem. Agricultural data typically captures information about
farming entities and operations. Every farming entity encapsulates an
individual farming concept, such as field, crop, seed, soil, temperature,
humidity, pest, and weed. Agricultural datasets are spatial, temporal, complex,
heterogeneous, non-standardized, and very large. In particular, agricultural
data is considered as Big Data in terms of volume, variety, velocity and
veracity. Designing and developing a data warehouse for precision agriculture
is a key foundation for establishing a crop intelligence platform, which will
enable resource efficient agronomy decision making and recommendations. Some of
the requirements for such an agricultural data warehouse are privacy, security,
and real-time access among its stakeholders (e.g., farmers, farm equipment
manufacturers, agribusinesses, co-operative societies, customers and possibly
Government agencies). However, currently there are very few reports in the
literature that focus on the design of efficient data warehouses with the view
of enabling Agricultural Big Data analysis and data mining. In this paper ...
Comment: 12 pages. Keywords: data warehouse, constellation schema, crop yield
prediction, precision agriculture
A Generalized Vector Space Model for Ontology-Based Information Retrieval
Named entities (NE) are objects that are referred to by names such as people,
organizations and locations. Named entities and keywords are important to the
meaning of a document. We propose a generalized vector space model that
combines named entities and keywords. In the model, we take into account
different ontological features of named entities, namely, aliases, classes and
identifiers. Moreover, we use entity classes to represent the latent
information of interrogative words in Wh-queries, which are ignored in
traditional keyword-based searching. We have implemented and tested the
proposed model on a TREC dataset, as presented and discussed in the paper.
Comment: 5 pages, in Vietnamese. Keywords: information retrieval, vector space
model, ontology, named entity, keyword. Accepted by Vietnamese Journal on
Information Technologies and Communication
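One simple way to picture combining keywords with an ontological feature such as entity classes is to score each feature space separately and mix the scores. The sketch below is an assumption-laden simplification, not the model from the paper: the linear weighting, the feature-space names, and the mapping of the interrogative word "Where" to a `Location` class are all illustrative choices.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def combined_score(query, doc, weights):
    """Weighted sum of per-feature-space cosine similarities (an assumed combination rule)."""
    return sum(
        weight * cosine(query.get(space, Counter()), doc.get(space, Counter()))
        for space, weight in weights.items()
    )

# Hypothetical Wh-query "Where is the museum?": "Where" is represented as class Location.
query = {"keyword": Counter(["museum"]), "class": Counter(["Location"])}
doc_a = {"keyword": Counter(["museum", "art"]), "class": Counter(["Location", "Organization"])}
doc_b = {"keyword": Counter(["museum", "art"]), "class": Counter(["Person"])}
weights = {"keyword": 0.5, "class": 0.5}

score_a = combined_score(query, doc_a, weights)
score_b = combined_score(query, doc_b, weights)
```

Both documents match the keyword equally, so the class feature alone separates them: the document mentioning a location ranks higher.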